feat: use NucleotidePosition value object, add 0-based coordinate conversion#85
feat: use NucleotidePosition value object, add 0-based coordinate conversion#85KingKong1213 wants to merge 3 commits intomasterfrom
Conversation
…rger is already validated
simbig
left a comment
There was a problem hiding this comment.
Guter PR — Value Object für die Position ist genau richtig, gleiche Idee wie Chromosome. Würde ich gern approven, aber die Validierungsgrenze hat sich von < 1 auf < 0 verschoben und das ändert leider das Koordinatensystem. length() rechnet end - start + 1 (1-based closed), Position 0 führt da zu off-by-one. Details + Lösungsvorschlag im Inline-Kommentar.
src/NucleotidePosition.php
Outdated
| public function __construct($positionAsMixed) | ||
| { | ||
| $position = SafeCast::toInt($positionAsMixed); | ||
| if ($position < 0) { |
There was a problem hiding this comment.
Die Grenze hat sich hier von < 1 (in GenomicPosition/GenomicRegion vorher) auf < 0 verschoben. Damit wird Position 0 valide, aber das bricht die 1-based Annahme die durch den Rest vom Code geht:
// length() rechnet 1-based closed: end - start + 1
$region = new GenomicRegion(new Chromosome('chr7'), new NucleotidePosition(0), new NucleotidePosition(100));
$region->length(); // → 101, aber 0-based half-open wären 100 BasenAuch parse() unterstützt g.-Prefix (HGVS) — da ist g.0 per Definition ungültig.
Falls BED-Support der Treiber ist: BED ist ein I/O-Format, kein internes Koordinatensystem. Am saubersten als Konvertierung an der Grenze — gleicher Ansatz wie htsjdk/pysam:
| if ($position < 0) { | |
| if ($position < 1) { |
BED-Support z.B. als Factory auf GenomicRegion:
public static function fromBed(string $chromosome, int $start, int $end): self
{
// BED 0-based half-open → 1-based closed
return new self(
new Chromosome($chromosome),
new NucleotidePosition($start + 1),
new NucleotidePosition($end),
);
}Testcase dazu:
public function testFromBedConvertsToOneBased(): void
{
// BED: chr7 55249070 55249171 (EGFR Exon 19, 0-based half-open)
$region = GenomicRegion::fromBed('chr7', 55249070, 55249171);
// Intern 1-based closed: chr7:55249071-55249171
self::assertSame(55249071, $region->start);
self::assertSame(55249171, $region->end);
self::assertSame(101, $region->length());
}NucleotidePosition must reject position 0 to stay consistent with the 1-based closed coordinate system used by length(), containsPosition(), intersection() and HGVS g. notation. Add GenomicRegion::fromZeroBasedHalfOpen() and toZeroBasedHalfOpen() for BED/BAM/bigWig interoperability: convert at the I/O boundary, work internally in 1-based closed coordinates. 🤖 Generated with Claude Code
simbig
left a comment
There was a problem hiding this comment.
Fixes are in: NucleotidePosition enforces >= 1, fromZeroBasedHalfOpen()/toZeroBasedHalfOpen() für BED/BAM/bigWig. Tests grün, PHPStan sauber.
Introduces
NucleotidePositionvalue object (same pattern asChromosome) to centralize position validation. Constructors ofGenomicPositionandGenomicRegionnow acceptNucleotidePositioninstead ofint.Adds
GenomicRegion::fromZeroBasedHalfOpen()/toZeroBasedHalfOpen()for BED/BAM/bigWig interoperability. Internally everything stays 1-based closed.Changes
NucleotidePosition: value object wrappingSafeCast::toInt()+ validation (>= 1)GenomicPosition: constructor acceptsNucleotidePositioninstead ofintGenomicRegion: constructor acceptsNucleotidePositioninstead ofintGenomicRegion::fromZeroBasedHalfOpen(): factory for 0-based half-open coordinates (BED, BAM, bigWig)GenomicRegion::toZeroBasedHalfOpen(): serialize back to 0-based half-openBreaking changes
None.
GenomicPositionandGenomicRegionwere introduced in v6.2.0 (2 weeks ago) and are not yet widely adopted.